Exploring the Arts Council National Portfolio 2018-2022

On 27 June 2017, the Arts Council England announced the National Portfolio Organisations for the next four years. They also provided a factsheet and a dataset!

Now we've explored and had a stab at tidying the dataset, it's time to revisit it systematically.

Author: Edafe Onerhime

Created: 2018-08-04

Last Updated: 2018-08-05

Description: Systematic exploration of the original dataset and the tidy datasets: funding & portfolio.

Contents: Notebook folder with this Jupyter Notebook, data folder, research folder (not sync'd to GitHub)

Notes:

2018-08-05 - Changed direction with this notebook. Initially it was a loose exploration tool, now I'm using it to explore processes and artefacts that speed up systematic data analysis. This notebook will chop and change frequently!



In [1]:

    
%matplotlib inline

import os.path
import pandas as pd
import numpy as np
import missingno as msno

from IPython.core.display import display, HTML
from urllib.request import urlretrieve
from tabulate import tabulate



In [2]:

    
"""
    Making 'global' variables we'll re-use later easier to set
"""
originalFile = {'fname': 'NPO_successful_20182022_July_6.xlsx',
                'url': 'http://www.artscouncil.org.uk/sites/default/files/download-file/NPO_successful_20182022_July_6.xlsx'}

dataFolderOut = os.path.join(os.pardir,'data','output')
dataFolderIn = os.path.join(os.pardir,'data','input')



In [3]:

    
"""
    All the functions we'll need
"""

def customInfo(dfIn):
    """
        Summary text and table based on Dataframe.info()
    """
    
    df = dfIn.copy(deep=True)
    df.df_name = dfIn.df_name
    
    CI = 'The dataset ' + "'" + df.df_name + "'" + \
         ' has ' + str(df.shape[1]) + ' columns and ' + str(df.shape[0]) + ' rows. Columns: '
    s = df.get_dtype_counts()
    for ind, val in s.iteritems():
        CI = CI + str(val) + ' ' + ind +', '            
    CI = CI[:-2]+'.'

    dfCI = pd.DataFrame()
    df.reset_index(inplace=True)
    dfCI['count'] = df.count()
    dfCI['unique'] = df.T.apply(lambda x: x.nunique(), axis=1)
    dfCI['min'] = df.min()
    dfCI['max'] = df.max()
    dfCI['median'] = df.median()
    dfCI['range'] = dfCI['max'].apply(pd.to_numeric, errors='coerce') - dfCI['min'].apply(pd.to_numeric, errors='coerce')
    try:
        dfCI['iqr'] = df.quantile(0.75) - df.quantile(0.25)    
    except:
        dfCI['iqr'] = np.nan
    dfCI.sort_index(inplace=True)
    return dfCI, CI

def portfolioTransform(dfIn, ColumnName, Year, Period, Type):
    """ 
        Returns a dataframe with Portfolio Funding amount, Year e.g. 2015 and Period e.g. 2015-2018
    """
    df = dfIn[[ColumnName]].copy(deep=True)
    df['Year'] = Year
    df['Period'] = Period
    df['Type'] = Type
    df.rename(columns={ColumnName:'Amount'}, inplace=True)
    return df

def describeCategory(dfIn, measure=''):
    """
        Returns an array of dataframes showing count and proportion of each category
        (1 dataframe per category column)
        When a measure is provided, use the measure instead of the count
    """
    
    df = dfIn.copy(deep=True).reset_index()
    dataframes = []
    i = 0
    
    if len(measure)==0:
        for catCol in list(df.select_dtypes(include=['category']).columns):
            dataframes.append(pd.DataFrame(df[catCol].value_counts()).reset_index().rename(columns={catCol: 'Count', 'index': catCol}))
            dataframes[i]['%'] = dataframes[i][['Count']]/df.shape[0]
            dataframes[i].df_name = catCol
            i+= 1
    else:
        for catCol in list(df.select_dtypes(include=['category']).columns):
            dataframes.append(pd.DataFrame(df.reset_index().groupby(catCol)[measure].agg('sum').reset_index()))
            dataframes[i]['%'] = dataframes[i][[measure]]/df[measure].sum() 
            dataframes[i].df_name = catCol
            i+= 1
    return dataframes



In [4]:

    
"""
    Load datasets we'll use later
"""

# Fetch/Load Original
fname = os.path.join(dataFolderIn,originalFile['fname'])
if not os.path.isfile(fname):
    print('Downloading:',fname)
    url = originalFile['url']
    urlretrieve(url, fname)

df = pd.read_excel(open(fname,'rb'), sheetname=0)
df.df_name = 'Original'

"""
    Get summary information
"""
dfInfo, infoText = customInfo(df)

print(infoText)
dfInfo









    



The dataset 'Original' has 22 columns and 844 rows. Columns: 4 float64, 5 int64, 13 object.






    Out[4]:







  
    
      
      count
      unique
      min
      max
      median
      range
      iqr
    
  
  
    
      % Cash change between 17/18 and 18/19
      844
      252
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      ACE Region
      844
      9
      East
      Yorkshire
      NaN
      NaN
      NaN
    
    
      Alternative Name
      111
      108
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      Applicant Name
      844
      831
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      Area
      844
      6
      London
      South West
      NaN
      NaN
      NaN
    
    
      Discipline
      844
      9
      Combined arts
      Visual arts
      NaN
      NaN
      NaN
    
    
      Funding Band
      844
      4
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      Funding Source (GIA or LOT)
      844
      2
      GIA
      LOT
      NaN
      NaN
      NaN
    
    
      Local Authority
      844
      184
      Allerdale
      York
      NaN
      NaN
      NaN
    
    
      Notes
      37
      16
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      ONS Region
      844
      10
      East Midlands
      Yorkshire and The Humber
      NaN
      NaN
      NaN
    
    
      Portfolio funded in 2015-18?
      844
      2
      No
      Yes
      NaN
      NaN
      NaN
    
    
      Portfolio funding  15/16 - £\n
      844
      570
      0
      2.4772e+07
      125389.350598
      24772000.0
      2.548629e+05
    
    
      Portfolio funding  16/17 - £\n
      844
      570
      0
      2.4772e+07
      125389.350598
      24772000.0
      2.548629e+05
    
    
      Portfolio funding  17/18 - £\n
      844
      570
      0
      2.4772e+07
      125389.350598
      24772000.0
      2.548629e+05
    
    
      Portfolio grant 18/19 - £
      844
      632
      40000
      24028840
      181022.000000
      23988840.0
      2.613698e+05
    
    
      Portfolio grant 19/20 - £
      844
      632
      40000
      24028840
      181022.000000
      23988840.0
      2.613698e+05
    
    
      Portfolio grant 20/21 - £
      844
      632
      40000
      24028840
      181022.000000
      23988840.0
      2.613698e+05
    
    
      Portfolio grant 21/22 - £
      844
      632
      40000
      24028840
      181022.000000
      23988840.0
      2.613698e+05
    
    
      TOTAL Portfolio funding  15/18 - £\n
      844
      570
      0
      7.4316e+07
      376168.051793
      74316000.0
      7.645886e+05
    
    
      TOTAL Portfolio grant 18/22 - £
      844
      632
      160000
      96115360
      724088.000000
      95955360.0
      1.045479e+06
    
    
      Web address
      844
      829
      -
      https://www.unicorntheatre.com/
      NaN
      NaN
      NaN
    
    
      index
      844
      844
      0
      843
      421.500000
      843.0
      4.215000e+02



In [5]:

    
"""
    Tidy and enrich the original dataset
"""

# Strip out carriage returns in column names
df = df.rename(columns={col: col.replace('\n','').replace('  ',' ') for col in df.columns})
df.df_name = 'All'

# Strip out text value 'NEW' so column is all numbers
df.loc[df['% Cash change between 17/18 and 18/19'].str.contains('NEW', na=False), '% Cash change between 17/18 and 18/19'] = np.NaN

# Make these columns categories
catCols = ['Funding Band', 'Area', 'ACE Region', 'ONS Region', 'Discipline', 'Funding Source (GIA or LOT)']
df[catCols] = df[catCols].apply(pd.Categorical)

# Make this column true or false (Bool)
df.loc[df['Portfolio funded in 2015-18?'].str.contains('Yes', na=False), 'Portfolio funded in 2015-18?'] = True
df.loc[df['Portfolio funded in 2015-18?'].str.contains('No', na=False), 'Portfolio funded in 2015-18?'] = False
df['Portfolio funded in 2015-18?'] = df['Portfolio funded in 2015-18?'].astype('bool')

# Make these columns ints
intCols = ['Portfolio funding 15/16 - £', 'Portfolio funding 16/17 - £', 'Portfolio funding 17/18 - £', 
           'TOTAL Portfolio funding 15/18 - £']
df[intCols] = df[intCols].astype('int')

# Enrich: Funding Source (GIA or LOT) using descriptions here: https://www.artsprofessional.co.uk/news/npo-scheme-merge-ace-grant-aid-and-lottery-funding
df['Funding Source Description'] = df['Funding Source (GIA or LOT)'].astype(str)
df.loc[df['Funding Source Description'].str.contains('GIA', na=False), 'Funding Source Description'] = 'Grant in Aid'
df.loc[df['Funding Source Description'].str.contains('LOT', na=False), 'Funding Source Description'] = 'National Lottery'

# Enrich: Funding Bands using descriptions here: http://www.artscouncil.org.uk/sites/default/files/download-file/Briefing_current_NPOs_MPMs_FINAL.pptx
df['Funding Band Description'] = df['Funding Band'].astype(str)
df.loc[~df['Funding Band Description'].str.contains('SSO'), 'Funding Band Description'] = 'Band ' + df['Funding Band Description'].astype(str)
df.loc[df['Funding Band Description'].str.contains('SSO'), 'Funding Band Description'] = 'Sector Support Organisation'

# Pull information out of notes: Bridge Organisation, Uplift, Reduction, Museum Development
df['Bridge Organisation'] = False
df['Technical Uplift'] = False
df['Technical Reduction'] = False
df['Museum Development'] = False
df.loc[df['Notes'].str.lower().str.contains('bridge organisation', na=False), 'Bridge Organisation'] = True
df.loc[df['Notes'].str.lower().str.contains('uplift', na=False), 'Technical Uplift'] = True
df.loc[df['Notes'].str.lower().str.contains('reduction', na=False), 'Technical Reduction'] = True
df.loc[df['Notes'].str.lower().str.contains('museum development', na=False), 'Museum Development'] = True

# Add Index (Looks like Applicant Name & Funding Band are unique)
df.set_index(['Applicant Name', 'Funding Band'], inplace=True)

"""
    Split into Applicant, Classification, Portfolio and save as csv
"""

dfApplicant = df[['Alternative Name', 'Web address', 'Discipline', 'Notes', 'Area', 'ACE Region', 'ONS Region', 'Local Authority']]
dfApplicant.df_name = 'Applicant'

dfClassification = df[['Funding Source (GIA or LOT)', 'Funding Source Description', 'Funding Band Description', 'Portfolio funded in 2015-18?', 'Bridge Organisation', 'Technical Uplift', 'Technical Reduction', 'Museum Development']]
dfClassification.df_name = 'Classification'

dfPortfolio = portfolioTransform(df,'Portfolio funding 15/16 - £','2015','2015-2018','Fund')
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio funding 16/17 - £','2016','2015-2018','Fund'))
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio funding 17/18 - £','2017','2015-2018','Fund'))
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio grant 18/19 - £','2018','2018-2022','Grant'))
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio grant 19/20 - £','2019','2018-2022','Grant'))
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio grant 20/21 - £','2020','2018-2022','Grant'))
dfPortfolio = dfPortfolio.append(portfolioTransform(df,'Portfolio grant 21/22 - £','2021','2018-2022','Grant'))

# Make Period, Type categories then add Year to index
catCols = ['Year','Period','Type']
dfPortfolio[catCols] = dfPortfolio[catCols].apply(pd.Categorical)
dfPortfolio.set_index('Year', append=True, inplace=True)
dfPortfolio.df_name = 'Portfolio'

# Save csv
filePrefix = 'NPO_2018-22_'
df.to_csv(os.path.join(dataFolderOut,filePrefix+'all.csv'))
dfApplicant.to_csv(os.path.join(dataFolderOut,filePrefix+'applicant.csv'))
dfPortfolio.to_csv(os.path.join(dataFolderOut,filePrefix+'portfolio.csv'))
dfClassification.to_csv(os.path.join(dataFolderOut,filePrefix+'classification.csv'))



In [6]:

    
"""
    Get summary information
"""
dfInfo, infoText = customInfo(df)
print(infoText)
dfInfo, infoText = customInfo(dfApplicant)
print(infoText)
dfInfo, infoText = customInfo(dfPortfolio)
print(infoText)
dfInfo, infoText = customInfo(dfClassification)
print(infoText)









    



The dataset 'All' has 26 columns and 844 rows. Columns: 5 bool, 5 category, 9 int64, 7 object.
The dataset 'Applicant' has 8 columns and 844 rows. Columns: 4 category, 4 object.
The dataset 'Portfolio' has 3 columns and 5908 rows. Columns: 2 category, 1 int64.
The dataset 'Classification' has 8 columns and 844 rows. Columns: 5 bool, 1 category, 2 object.



In [7]:

    
"""
    Landscape: Check for missing values using MissingNo
    https://github.com/ResidentMario/missingno
"""

msno.matrix(df)
msno.heatmap(df)
msno.bar(df)
msno.dendrogram(df)



In [8]:

    
print('\n', 'Applicant')
lstDesc = describeCategory(dfApplicant)
for dfDesc in lstDesc:
    print('\n',dfDesc.df_name)
    display(HTML(tabulate(dfDesc,headers='keys', tablefmt='html', showindex=False, )))
    
print('\n', 'Portfolio')
lstDesc = describeCategory(dfPortfolio)
for dfDesc in lstDesc:
    print('\n',dfDesc.df_name)
    display(HTML(tabulate(dfDesc,headers='keys', tablefmt='html', showindex=False, )))    

print('\n', 'Classification')
lstDesc = describeCategory(dfClassification)
for dfDesc in lstDesc:
    print('\n',dfDesc.df_name)
    display(HTML(tabulate(dfDesc,headers='keys', tablefmt='html', showindex=False, )))









    



 Applicant

 Funding Band






    






Funding Band    Count         %


1                 530 0.627962 
2                 190 0.225118 
3                  66 0.0781991
SSO                58 0.0687204








    



 Discipline






    






Discipline               Count          %


Theatre                    190 0.225118  
Combined arts              187 0.221564  
Visual arts                149 0.17654   
Music                      102 0.120853  
Museums                     72 0.0853081 
Dance                       64 0.0758294 
Literature                  49 0.0580569 
Not discipline specific      24 0.028436  
Libraries                    7 0.00829384








    



 Area






    






Area        Count         %


London        253 0.299763 
North         225 0.266588 
Midlands      129 0.152844 
South West     103 0.122038 
South East     100 0.118483 
National       34 0.0402844








    



 ACE Region






    






ACE Region     Count         %


London           268 0.317536 
South West       104 0.123223 
North West        94 0.111374 
Yorkshire         93 0.11019  
West Midlands      76 0.0900474
South East        57 0.0675355
East Midlands      56 0.0663507
North East        48 0.056872 
East              48 0.056872 








    



 ONS Region






    






ONS Region                Count          %


London                      268 0.317536  
North West                   94 0.111374  
Yorkshire and The Humber      93 0.11019   
South West                   86 0.101896  
South East                   74 0.0876777 
West Midlands                73 0.0864929 
East Midlands                56 0.0663507 
North East                   48 0.056872  
East of England              48 0.056872  
WALES                         4 0.00473934








    



 Portfolio

 Funding Band






    






Funding Band    Count         %


1                3710 0.627962 
2                1330 0.225118 
3                 462 0.0781991
SSO               406 0.0687204








    



 Year






    






  Year   Count        %


  2021     844 0.142857
  2020     844 0.142857
  2019     844 0.142857
  2018     844 0.142857
  2017     844 0.142857
  2016     844 0.142857
  2015     844 0.142857








    



 Period






    






Period     Count        %


2018-2022    3376 0.571429
2015-2018    2532 0.428571








    



 Type






    






Type    Count        %


Grant    3376 0.571429
Fund     2532 0.428571








    



 Classification

 Funding Band






    






Funding Band    Count         %


1                 530 0.627962 
2                 190 0.225118 
3                  66 0.0781991
SSO                58 0.0687204








    



 Funding Source (GIA or LOT)






    






Funding Source (GIA or LOT)    Count        %


GIA                              694 0.822275
LOT                              150 0.177725



In [9]:

    
print('\n', 'Portfolio')
lstDesc = describeCategory(dfPortfolio,'Amount')
for dfDesc in lstDesc:
    print('\n',dfDesc.df_name)
    display(HTML(tabulate(dfDesc,headers='keys', tablefmt='html', showindex=False, )))









    



 Portfolio

 Funding Band






    






Funding Band      Amount         %


1              422929045 0.155876 
2              633424839 0.233457 
3             1520734544 0.560485 
SSO            136157170 0.0501824








    



 Year






    






  Year    Amount        %


  2015 360013986 0.132688
  2016 360013986 0.132688
  2017 360153986 0.132739
  2018 407638276 0.15024 
  2019 408568678 0.150583
  2020 408388276 0.150517
  2021 408468410 0.150546








    



 Period






    






Period       Amount        %


2015-2018 1080181958 0.398114
2018-2022 1633063640 0.601886








    



 Type






    






Type      Amount        %


Fund  1080181958 0.398114
Grant 1633063640 0.601886



In [ ]:

	count	unique	min	max	median	range	iqr
% Cash change between 17/18 and 18/19	844	252	NaN	NaN	NaN	NaN	NaN
ACE Region	844	9	East	Yorkshire	NaN	NaN	NaN
Alternative Name	111	108	NaN	NaN	NaN	NaN	NaN
Applicant Name	844	831	NaN	NaN	NaN	NaN	NaN
Area	844	6	London	South West	NaN	NaN	NaN
Discipline	844	9	Combined arts	Visual arts	NaN	NaN	NaN
Funding Band	844	4	NaN	NaN	NaN	NaN	NaN
Funding Source (GIA or LOT)	844	2	GIA	LOT	NaN	NaN	NaN
Local Authority	844	184	Allerdale	York	NaN	NaN	NaN
Notes	37	16	NaN	NaN	NaN	NaN	NaN
ONS Region	844	10	East Midlands	Yorkshire and The Humber	NaN	NaN	NaN
Portfolio funded in 2015-18?	844	2	No	Yes	NaN	NaN	NaN
Portfolio funding 15/16 - £\n	844	570	0	2.4772e+07	125389.350598	24772000.0	2.548629e+05
Portfolio funding 16/17 - £\n	844	570	0	2.4772e+07	125389.350598	24772000.0	2.548629e+05
Portfolio funding 17/18 - £\n	844	570	0	2.4772e+07	125389.350598	24772000.0	2.548629e+05
Portfolio grant 18/19 - £	844	632	40000	24028840	181022.000000	23988840.0	2.613698e+05
Portfolio grant 19/20 - £	844	632	40000	24028840	181022.000000	23988840.0	2.613698e+05
Portfolio grant 20/21 - £	844	632	40000	24028840	181022.000000	23988840.0	2.613698e+05
Portfolio grant 21/22 - £	844	632	40000	24028840	181022.000000	23988840.0	2.613698e+05
TOTAL Portfolio funding 15/18 - £\n	844	570	0	7.4316e+07	376168.051793	74316000.0	7.645886e+05
TOTAL Portfolio grant 18/22 - £	844	632	160000	96115360	724088.000000	95955360.0	1.045479e+06
Web address	844	829	-	https://www.unicorntheatre.com/	NaN	NaN	NaN
index	844	844	0	843	421.500000	843.0	4.215000e+02

Discipline	Count	%
Theatre	190	0.225118
Combined arts	187	0.221564
Visual arts	149	0.17654
Music	102	0.120853
Museums	72	0.0853081
Dance	64	0.0758294
Literature	49	0.0580569
Not discipline specific	24	0.028436
Libraries	7	0.00829384

Area	Count	%
London	253	0.299763
North	225	0.266588
Midlands	129	0.152844
South West	103	0.122038
South East	100	0.118483
National	34	0.0402844

ACE Region	Count	%
London	268	0.317536
South West	104	0.123223
North West	94	0.111374
Yorkshire	93	0.11019
West Midlands	76	0.0900474
South East	57	0.0675355
East Midlands	56	0.0663507
North East	48	0.056872
East	48	0.056872

Year	Count	%
2021	844	0.142857
2020	844	0.142857
2019	844	0.142857
2018	844	0.142857
2017	844	0.142857
2016	844	0.142857
2015	844	0.142857

Funding Band	Amount	%
1	422929045	0.155876
2	633424839	0.233457
3	1520734544	0.560485
SSO	136157170	0.0501824

Year	Amount	%
2015	360013986	0.132688
2016	360013986	0.132688
2017	360153986	0.132739
2018	407638276	0.15024
2019	408568678	0.150583
2020	408388276	0.150517
2021	408468410	0.150546